In this document we learn how to create interactive charts with plotly. Simply put, we are learning how to transform tidy data into visually clear graphs. In the overall workflow, this is the step that turns our tidy data into data visualisations.
Note: LinkedIn Learning videos. There are references to LinkedIn Learning videos. These are complementary but not required, as the notes below are meant to be self-contained. Some students and staff have free access. Do not purchase access unless you are sure you don’t already have access through your organisation.
library("tidyverse")
library("plotly")
plotly uses the pipe (%>%) operator to create charts.
Let’s say we have a nice graph created with ggplot2:
library("ggplot2")
load("tidy_ACORN-SAT_data/station_data.rdata")
temp_chart <- station_data %>%
  filter(Station.name == "Sydney") %>%
  ggplot(aes(x = year,
             y = average.temp)) +
  geom_point(color = "red") +
  scale_x_discrete(breaks = seq(1910, 2010, by = 10)) +
  ggtitle("Sydney's Average Yearly Temperature") +
  xlab("Year") +
  ylab("Average Temperature (°C)")
temp_chart
There is a useful command ggplotly() to automatically convert this to an interactive plotly widget:
ggplotly(temp_chart)
Try using your mouse to test the interactivity!
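If the ACORN-SAT file is not to hand, the same conversion can be tried on a dataset that ships with R. The following is a minimal sketch, assuming ggplot2 and plotly are installed; the mtcars data and the names demo_chart and demo_widget are stand-ins chosen for illustration and are not part of the course material.

```r
library("ggplot2")
library("plotly")

# mtcars ships with R, so no data files are needed for this sketch.
demo_chart <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red") +
  ggtitle("Fuel Economy against Weight") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per Gallon")

# ggplotly() converts the static ggplot object into an interactive widget.
demo_widget <- ggplotly(demo_chart)
demo_widget
```

Printing demo_widget in RStudio or an R Markdown document should render the chart with hover, zoom and pan controls.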
For our examples we will use the same data from the Australian Environmental-Economic Accounts (2016), now including data from 2008-2014. The data relates to water consumption by state.
load(file = "tidy_EnvAcc_data/consumption.rdata")
consumption
## # A tibble: 48 x 3
## State year water_consumption
## <chr> <chr> <dbl>
## 1 NSW 2008–09 4555
## 2 VIC 2008–09 2951
## 3 QLD 2008–09 3341
## 4 SA 2008–09 1179
## 5 WA 2008–09 1361
## 6 TAS 2008–09 466
## 7 NT 2008–09 160
## 8 ACT 2008–09 48
## 9 NSW 2009–10 4323
## 10 VIC 2009–10 2904
## # … with 38 more rows
We begin by piping our data into plot_ly():
consumption %>%
  plot_ly()
Already our chart is “interactive”: we can use our mouse to select specific areas of the empty graph. However, no data is drawn yet.
add_trace() is a key plotly function that configures our graphs.
consumption %>%
  group_by(year) %>%
  summarise(consumption_total = sum(water_consumption)) %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~consumption_total,
            type = "bar")
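To try the summarise-then-plot pattern without the course data file, here is a small sketch on a hand-made table. The numbers are made up purely for illustration (they are not the real consumption figures), and the names toy_consumption, toy_totals and toy_bar are hypothetical.

```r
library("dplyr")
library("plotly")

# Made-up figures for illustration only; NOT the real consumption data.
toy_consumption <- tibble(
  State = c("NSW", "VIC", "NSW", "VIC"),
  year = c("2008-09", "2008-09", "2009-10", "2009-10"),
  water_consumption = c(10, 20, 30, 40)
)

# One row per year, holding the total over all states.
toy_totals <- toy_consumption %>%
  group_by(year) %>%
  summarise(consumption_total = sum(water_consumption))

# Columns are referred to with a tilde inside add_trace().
toy_bar <- toy_totals %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~consumption_total,
            type = "bar")
toy_bar
```

Inspecting toy_totals before plotting is a quick way to check that the bars will show the values we expect.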
We must use a tilde (“~”) when calling on columns from our data.
We have made a generic bar graph based on total water consumption per year, but what if we want a more advanced bar graph?
Chart-level options are set with layout().
Now we want a stacked bar chart, showing the breakdown of which state consumed water in a given year.
- Set the barmode layout option to “stack”
- Use the color argument of the trace to colour stack sections by variable
consumption %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack")
To show each state's share of the yearly total rather than raw amounts, use the layout() argument barnorm:
consumption %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack",
         barnorm = "percent")
Mouse over the graph to see information!
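It can help to see the arithmetic behind barnorm = "percent": each segment is rescaled so that every stack sums to 100. The sketch below reproduces that calculation by hand with dplyr. The data are made up for illustration, and toy and toy_shares are hypothetical names.

```r
library("dplyr")

# Made-up figures for illustration only; NOT the real consumption data.
toy <- tibble(
  State = c("NSW", "VIC", "NSW", "VIC"),
  year = c("2008-09", "2008-09", "2009-10", "2009-10"),
  water_consumption = c(10, 30, 20, 20)
)

# Within each year, express every state's consumption as a share of the total.
toy_shares <- toy %>%
  group_by(year) %>%
  mutate(share_percent = 100 * water_consumption / sum(water_consumption)) %>%
  ungroup()

toy_shares
```

The share_percent column is the kind of value the normalised chart displays on hover, so computing it directly is a simple way to sanity-check the plot.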
consumption %>%
  plot_ly() %>%
  add_trace(x = ~water_consumption,
            y = ~year,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack",
         barnorm = "percent")
It is an exercise for the careful reader to deduce what part of the code was modified!
For our examples we will use data from the ABARES Agricultural Census of 2015-2016. The data relates to the average climate-adjusted productivity of all cropping farms between 1977 and 2015.
load("tidy_ABARES_data/farm_data.rdata")
head(farm_data, n=5)
## # A tibble: 5 x 4
## year Total.factor.productivity Climate.effect Climate.adjusted.TFP
## <chr> <dbl> <dbl> <dbl>
## 1 1978 95.9 89.7 103.
## 2 1979 113. 113. 102.
## 3 1980 112. 106. 103.
## 4 1981 84.2 92.5 101.
## 5 1982 104. 105. 101.
- Set type to “scatter”
- Use the mode argument, set to “markers”
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~Climate.effect,
            y = ~Total.factor.productivity,
            mode = "markers",
            color = ~Climate.adjusted.TFP)
Using the same data as before, let’s make some line charts.
To draw lines instead of points, we change mode:
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines")
- Customise points and lines with the marker and line arguments
- We can set color this way
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines",
            marker = list(color = "Blue"),
            line = list(color = "Red"))
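With mode = "lines" only the line is drawn, so a marker colour has nothing to apply to. A minimal sketch of the combined mode "lines+markers", where both settings become visible, is given below. The index values are made up for illustration, and toy_index and toy_lines are hypothetical names.

```r
library("plotly")

# Made-up index values for illustration only.
toy_index <- data.frame(
  year = 2001:2005,
  index = c(98, 103, 101, 107, 110)
)

# "lines+markers" draws both, so marker and line colours are each visible.
toy_lines <- toy_index %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~index,
            mode = "lines+markers",
            marker = list(color = "blue"),
            line = list(color = "red"))
toy_lines
```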
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~Climate.adjusted.TFP,
            y = ~Total.factor.productivity,
            mode = "markers",
            size = ~Climate.effect)
- Chart and axis titles are layout() options
  - xaxis and yaxis are layout options
  - title is set to whatever we like
- Hover text is added with the text argument of add_trace()
  - Use paste() or paste0() to insert text and a variable
  - Remember the tilde (“~”)!
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", Climate.effect)) %>%
  layout(title = "Total Factor Productivity over Time",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Total Factor Productivity"))
- Use the hoverinfo argument to choose what is shown on hover
- Here we keep only the text argument active
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", Climate.effect),
            hoverinfo = c("text")) %>%
  layout(title = "Total Factor Productivity over Time",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Total Factor Productivity"))
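Because paste0() is vectorised, the text argument produces one label per row. The sketch below builds the labels separately first so they can be inspected, then uses the same expression in a chart. The values are made up for illustration, and toy_farm, hover_labels and toy_hover are hypothetical names.

```r
library("plotly")

# Made-up values for illustration only; NOT the real ABARES figures.
toy_farm <- data.frame(
  year = c("1978", "1979", "1980"),
  productivity = c(96, 113, 112),
  climate = c(90, 113, 105.5)
)

# paste0() returns one string per row of the data.
hover_labels <- paste0("Climate effect is: ", toy_farm$climate)
hover_labels

# The same expression, written as a formula, supplies the hover text.
toy_hover <- toy_farm %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", climate),
            hoverinfo = "text")
toy_hover
```

hoverinfo also accepts combinations joined with a plus sign, such as "x+text", if we want to keep the x value alongside our custom label.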
We can plot several graphs that share one axis. In this example (using farm_data) we share the x-axis, but sharing the y-axis works the same way.
variables <- c("Total.factor.productivity",
               "Climate.effect",
               "Climate.adjusted.TFP")
- We write a function that takes the name of a variable as a string (var1)
- Because var1 is a string, we convert it to a formula with as.formula(paste0("~", var1))
- We use the name argument of add_trace() to give each trace the name of its variable. This determines what appears on the legend in the final plot
plot1 <- function(var1) {
  farm_data %>%
    plot_ly() %>%
    add_trace(type = "scatter",
              x = ~year,
              y = as.formula(paste0("~", var1)),
              name = var1,
              mode = "lines") %>%
    layout(xaxis = list(title = "Year"),
           yaxis = list(title = "Index"))
}
- Use lapply() to create the same plot for all our variables at once
plots <- lapply(variables, plot1)
subplot(plots,
        shareX = TRUE,
        nrows = length(plots),
        titleY = TRUE)
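The string-to-formula step is the least obvious part of this pattern, so here it is in isolation on a dataset that ships with R. This is a sketch only: mtcars stands in for farm_data, and make_formula, plot_column, demo_columns, demo_plots and demo_subplot are hypothetical names introduced for illustration.

```r
library("plotly")

# Turn a column name held in a string into a one-sided formula, e.g. ~mpg.
make_formula <- function(var1) {
  as.formula(paste0("~", var1))
}

# Build one scatter plot for the column named in var1.
plot_column <- function(var1) {
  mtcars %>%
    plot_ly() %>%
    add_trace(type = "scatter",
              x = ~wt,
              y = make_formula(var1),
              name = var1,
              mode = "markers")
}

demo_columns <- c("mpg", "hp")
demo_plots <- lapply(demo_columns, plot_column)

# Stack the plots vertically so that they share the x-axis.
demo_subplot <- subplot(demo_plots,
                        shareX = TRUE,
                        nrows = length(demo_plots),
                        titleY = TRUE)
demo_subplot
```

Calling make_formula("mpg") on its own and printing the result is a quick way to confirm that the formula refers to the intended column.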
- We can change nrows to easily plot graphs next to each other, but this is visually weaker
We return to our ACORN-SAT data as an example.
- Use the plot_geo() function instead of plot_ly()
- Set x to be longitude, and y to be latitude
load("tidy_ACORN-SAT_data/station_data.rdata")
station_data %>%
  filter(year == 2000) %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            color = ~average.temp)
## No scattergeo mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
Note that if we wish we may still add the argument size to our data. There is also an opacity option which takes a numeric value.
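As a sketch of these options, the example below passes mode = "markers" explicitly, which should avoid the “No scattergeo mode specifed” message, and adds size and opacity. The coordinates are approximate and the temperatures are made up for illustration; toy_stations and toy_map are hypothetical names.

```r
library("plotly")

# Approximate city coordinates; temperatures are made up for illustration.
toy_stations <- data.frame(
  city = c("Sydney", "Melbourne", "Perth"),
  Longitude = c(151.2, 145.0, 115.9),
  Latitude = c(-33.9, -37.8, -32.0),
  average.temp = c(18, 15, 19)
)

# Stating the mode ourselves means plotly does not have to pick a default.
toy_map <- toy_stations %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            mode = "markers",
            color = ~average.temp,
            size = ~average.temp,
            opacity = 0.7)
toy_map
```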
- Map features are customised with the layout() function
- We use the geo argument, and once again we must format it as a list
load("tidy_ACORN-SAT_data/station_data.rdata")
station_data %>%
  filter(year == 2000) %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            color = ~average.temp) %>%
  layout(geo = list(showlakes = TRUE,
                    showrivers = TRUE,
                    showcountries = TRUE))
## No scattergeo mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
For our examples let us use a global surface temperature dataset obtained through Kaggle. We consider temperatures for just one day.
temperature_data <- read_csv("global_temp/GlobalLandTemperaturesByCountry.csv") %>%
  filter(dt == "2000-01-01")
Making a choropleth is relatively simple, but not extensively customisable.
1. We pipe our data into plot_geo()
2. We specify how we determine location with the locationmode argument of add_trace()
   - locationmode can take the value of “ISO-3”, “USA-states” or “country names”
   - The last option is usually the most useful
   - ISO-3 is a method which codes regions by 3 letter codes
3. We specify what column of our data contains our regional names with locations
4. We specify which variable we wish to colour by using the z argument
temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            z = ~AverageTemperature)
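For comparison, here is a minimal sketch of the “ISO-3” location mode, which matches regions by three-letter country code rather than by name. The values are made up for illustration, the trace type is stated explicitly as "choropleth", and toy_world and toy_choropleth are hypothetical names.

```r
library("plotly")

# Three-letter country codes; the values are made up for illustration.
toy_world <- data.frame(
  code = c("AUS", "NZL", "FRA", "BRA"),
  value = c(1, 2, 3, 4)
)

# locations points at the code column, z at the variable to colour by.
toy_choropleth <- toy_world %>%
  plot_geo() %>%
  add_trace(type = "choropleth",
            locationmode = "ISO-3",
            locations = ~code,
            z = ~value)
toy_choropleth
```

Codes avoid the spelling mismatches that can occur with country names, at the cost of needing a code column in the data.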
What happens if we don’t specify z?
- We drop z and simply specify that color refers to temperature
temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            color = ~AverageTemperature)
## No scattergeo mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
- We can use locationmode and locations to create scattergeo plots without longitudes and latitudes!
- Use mutate() and factor() to strictly order how we plot our variable-to-colour (here, State)
consumption %>%
  mutate(State = factor(State,
                        levels = c("NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"))) %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State,
            colors = c("Red", "Yellow", "Pink", "Green", "Purple", "Orange", "Blue", "Violet"))
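The reason for the factor() step is that, by default, R orders factor levels alphabetically, which may not be the order in which we listed our colours. The base R sketch below shows the difference; the vector states and the names default_order and custom_order are made up for illustration.

```r
# A small made-up vector of state labels.
states <- c("VIC", "NSW", "QLD", "NSW")

# Without explicit levels, the levels are sorted alphabetically.
default_order <- factor(states)
levels(default_order)

# With explicit levels, the order is exactly the one we provide.
custom_order <- factor(states, levels = c("NSW", "VIC", "QLD"))
levels(custom_order)
```

Fixing the level order first makes it easier to predict which entry of colors ends up on which state.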
- Choose the variable to colour by with the color argument of add_trace()
- Choose the colours themselves with the colors argument of add_trace()
- Use colorRampPalette() to define a palette
palette <- colorRampPalette(c('yellow', 'red'))(length(temperature_data$AverageTemperature))
temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            z = ~AverageTemperature,
            colors = palette)
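colorRampPalette() does not return colours directly. It returns a function, and calling that function with a number n gives n colours interpolated between the endpoints. A small base R sketch follows; make_palette and five_colours are hypothetical names.

```r
# colorRampPalette() returns a function that generates colours on demand.
make_palette <- colorRampPalette(c("yellow", "red"))

# Ask for five colours running from yellow to red, as hexadecimal strings.
five_colours <- make_palette(5)
five_colours
```

The first and last entries are the endpoint colours themselves, with evenly spaced blends in between. Any reasonable n can be used here; it does not have to match the number of rows in the data.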